Corpus-trained Text Generation for Summarization
Authors
Abstract
We explore how machine learning can be employed to learn rulesets for the traditional modules of content planning and surface realization. Our approach takes advantage of semantically annotated corpora to induce preferences for content planning and constraints on realizations of these plans. We applied this methodology to an annotated corpus of indicative summaries to derive constraint rules that can assist in generating summaries for new, unseen material.
Similar resources
Crowd-Sourced Iterative Annotation for Narrative Summarization Corpora
We present an iterative annotation process for producing aligned, parallel corpora of abstractive and extractive summaries for narrative. Our approach uses a combination of trained annotators and crowd-sourcing, allowing us to elicit human-generated summaries and alignments quickly and at low cost. We use crowd-sourcing to annotate aligned phrases with the text-to-text generation techniques nee...
A'lam Corpus (پیکره اعلام): A Standard Named Entity Corpus for the Persian Language
Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used in text summarization, data mining, data retrieval, question answering, machine translation, and document classification systems. A NER system is tasked with determining the boundaries of each named entity, recognizing its type, and classifying it into predefined categories. The categories of named...
Self-Supervised Learning for Automatic Text Summarization by Text-span Extraction
We describe a system for automatic text summarization that operates by extracting the most relevant sentences from documents with regard to a query. The lack of labeled corpora makes it difficult to develop automatic techniques for summarization. We propose to use a self-supervised method which does not rely on the availability of labeled corpora for learning to rank sentences for the summary. ...
Extractive vs. NLG-based Abstractive Summarization of Evaluative Text: The Effect of Corpus Controversiality
Extractive summarization is the strategy of concatenating extracts taken from a corpus into a summary, while abstractive summarization involves paraphrasing the corpus using novel sentences. We define a novel measure of corpus controversiality of opinions contained in evaluative text, and report the results of a user study comparing extractive and NLG-based abstractive summarization at differen...
LCSTS: A Large Scale Chinese Short Text Summarization Dataset
Automatic text summarization is widely regarded as a highly difficult problem, partly because of the lack of large text summarization datasets. Because constructing large-scale summaries for full texts is a great challenge, in this paper we introduce a large-scale Chinese short text summarization dataset constructed from the Chinese microblogging website Sina Weibo, which will be ...